Semi-Markov Decision Processes
Abstract
The previous chapter dealt with the discrete-time Markov decision model. In this model, decisions can be made only at fixed epochs t = 0, 1, . . . . However, in many stochastic control problems the times between the decision epochs are not constant but random. A possible tool for analysing such problems is the semi-Markov decision model. In Section 7.1 we discuss the basic elements of this model. Also, for the optimality criterion of the long-run average cost per time unit, we give a data-transformation method by which the semi-Markov decision model can be converted into an equivalent discrete-time Markov decision model. The data-transformation method enables us to apply the recursive method of value iteration to the semi-Markov decision model. Section 7.2 summarizes various algorithms for the computation of an average cost optimal policy. In Section 7.3 we discuss the value-iteration algorithm for a semi-Markov decision model in which the times between the decision epochs are exponentially distributed. For this particular case the computational effort of the value-iteration algorithm can be considerably reduced by introducing fictitious decision epochs. This simple trick creates sparse transition matrices, leading to a much more effective value-iteration algorithm. Section 7.4 illustrates how value iteration in combination with an embedding idea can be used in the optimization of queues. The semi-Markov decision model is a very useful tool for optimal control in queueing systems. In Section 7.5 we will exploit a remarkable feature of the policy-iteration algorithm, namely that the algorithm typically achieves its largest improvements in costs in the first few iterations. This finding is sometimes useful for attacking the curse of dimensionality in applications with a multidimensional state space. The idea is first to determine the relative values for a reasonable starting policy and then to apply a single policy-improvement step. This heuristic approach will be illustrated with a dynamic routing problem.
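To make the data-transformation idea concrete, here is a minimal sketch in Python of how a finite semi-Markov decision model can be converted into an equivalent discrete-time model and solved by value iteration for the long-run average cost. The array layout, the function name `smdp_value_iteration`, and the stopping tolerance are illustrative assumptions, not code from the chapter.

```python
import numpy as np

def smdp_value_iteration(c, tau_ia, P, eps=1e-8, max_iter=10_000):
    """Average-cost value iteration for a finite semi-Markov decision model,
    via the data transformation to an equivalent discrete-time MDP.

    c[i, a]      : expected cost incurred until the next decision epoch
    tau_ia[i, a] : expected time until the next decision epoch
    P[a, i, j]   : transition probabilities of the embedded Markov chain
    """
    n_states, n_actions = c.shape
    # Pick tau strictly below min tau(i, a) so the transformed chain is aperiodic.
    tau = 0.9 * tau_ia.min()
    # Transformed model: cost rates c(i, a) / tau(i, a) and transition law
    #   p~(j | i, a) = (tau / tau(i, a)) p(j | i, a) + (1 - tau / tau(i, a)) 1{j = i}.
    ratio = tau / tau_ia                       # shape (n_states, n_actions)
    c_rate = c / tau_ia
    P_t = np.empty_like(P)
    for a in range(n_actions):
        P_t[a] = ratio[:, a, None] * P[a] + np.diag(1.0 - ratio[:, a])
    V = np.zeros(n_states)
    for _ in range(max_iter):
        Q = c_rate + np.stack([P_t[a] @ V for a in range(n_actions)], axis=1)
        V_new = Q.min(axis=1)
        diff = V_new - V
        m, M = diff.min(), diff.max()          # m <= g <= M at every iteration
        V = V_new
        if M - m <= eps:
            break
    g = 0.5 * (m + M)                          # average cost per unit time
    policy = Q.argmin(axis=1)                  # greedy (near-optimal) stationary policy
    return g, policy
```

Because the transformed one-step costs are the rates c(i, a)/τ(i, a), the average cost per step of the transformed discrete-time model coincides with the average cost per unit time of the semi-Markov model, which is why the bounds m and M bracket g directly.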
Similar Resources
Existence of Optimal Policies for Semi-Markov Decision Processes Using Duality for Infinite Linear Programming
Semi-Markov decision processes on Borel spaces with deterministic kernels have many practical applications, particularly in inventory theory. Most of the results from general semi-Markov decision processes do not carry over to a deterministic kernel since such a kernel does not provide “smoothness.” We develop infinite-dimensional linear programming theory for a general stochastic semi-Markov d...
Semi-Markov Decision Processes
Considered are infinite-horizon semi-Markov decision processes (SMDPs) with finite state and action spaces. Total expected discounted reward and long-run average expected reward optimality criteria are reviewed. Solution methodology for each criterion is given; constraints and variance sensitivity are also discussed.
Accelerated decomposition techniques for large discounted Markov decision processes
Many hierarchical techniques to solve large Markov decision processes (MDPs) are based on partitioning the state space into strongly connected components (SCCs) that can be organized into levels. At each level, smaller problems, called restricted MDPs, are solved, and these partial solutions are then combined to obtain the global solution. In this paper, we first propose a novel algorith...
Regret optimality in semi-Markov decision processes with an absorbing set
The optimization problem for the general utility case is considered for countable-state semi-Markov decision processes. The regret-utility function is introduced as a function of two variables, one a target value and the other a present value. We consider the expectation of the regret-utility function incurred up to the time of reaching a given absorbing set. In order to characterize the regret...
Solving Generalized Semi-Markov Processes using Continuous Phase-Type Distributions
We introduce the generalized semi-Markov decision process (GSMDP) as an extension of continuous-time MDPs and semi-Markov decision processes (SMDPs) for modeling stochastic decision processes with asynchronous events and actions. Using phase-type distributions and uniformization, we show how an arbitrary GSMDP can be approximated by a discrete-time MDP, which can then be solved using existing M...
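The uniformization step mentioned in this summary is easy to state for the simplest case of a continuous-time MDP. The following sketch (with assumed names and array conventions, not taken from the paper) converts rate matrices and cost rates into a discrete-time discounted MDP.

```python
import numpy as np

def uniformize(Q, c_rate, alpha):
    """Uniformize a continuous-time MDP into a discrete-time discounted MDP.

    Q[a, i, j]   : transition-rate matrices (rows sum to 0, Q[a, i, i] <= 0)
    c_rate[i, a] : cost accrued per unit time in state i under action a
    alpha        : continuous-time discount rate
    Returns transition matrices P, per-step costs c, and discount factor beta.
    """
    n_actions, n_states, _ = Q.shape
    lam = max(np.max(-Q[a].diagonal()) for a in range(n_actions))  # uniformization rate
    P = np.stack([np.eye(n_states) + Q[a] / lam for a in range(n_actions)])
    beta = lam / (lam + alpha)          # discrete-time discount factor
    c = c_rate / (lam + alpha)          # expected discounted cost per transition
    return P, c, beta
```

Any standard discounted value-iteration or policy-iteration routine can then be run on the resulting triple (P, c, beta).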
Publication date: 2004